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Abstract This paper is devoted to the study of the Monge-Kantorovich theory of optimal mass transport 
and its applications, in the special case of one-dimensional and circular distributions. More precisely, we 
study the Monge-Kantorovich distances between discrete sets of points on the unit circle S , in the case 
where the ground distance between two points x and y is defined as h(d(x, y)), where d is the geodesic 
distance on the circle and h a convex and increasing function. We first prove that computing a Monge- 
Kantorovich distance between two given sets of pairwise different points boils down to cut the circle at a 
well chosen point and to compute the same distance on the real line. This result is then used to obtain a 
metric between ID and circular discrete histograms, which can be computed in linear time. A particular 
case of this formula has already been used in [RDG09] for the matching of local features between images, 
involving circular histograms of gradient orientations. In this paper, other applications are investigated, in 
particular dealing with the hue component of color images. In a last part, a study is conducted to compare 
the advantages and drawbacks of transportation distances relying on convex or concave cost functions, 
and of the classical L 1 distance. 

Keywords Optimal mass transportation theory • Earth Mover's Distance • Circular histograms ■ Image 
retrieval 
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1 Introduction 

The theory of optimal transportation was first introduced by Monge [Mon8 1] in its Memoire sur la theorie 
des deblais et des remblais (1781) and rediscovered by Kantorovich [Kan42] in the late '30s. The Monge- 
Kantorovich problem can be described in the following way. Given two probability distributions / and g 
on X and c a nonnegative measurable cost function on X x X, the aim is to find the optimal transportation 
cost 
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where iT(/, g) is the set of probability measures on X x X with marginals / and g (such measures are 
called transportation plans). The existence, uniqueness and behavior of optimal transportation plans has 
been thoroughly studied in the last decades [Vil03, ACB+03, McC99, Vil08, McC95, GM96]. 

This framework is nowadays widely used in many fields of research, such as cosmology [FMMS02], 
meteorology [C11IO6], fluid mechanics or electromagnetic (see [ACB + 03] for a complete review). 

The use of the Monge-Kantorovich framework in image processing and computer vision has been 
popularized by Rubner et al. [RTGOO] for image retrieval and texture classification with the introduction 
of the so-called Earth Mover's Distance (EMD). Although the definition of the EMD is slightly different 
from the original Monge-Kantorovich formulation, these are equivalent when considering distributions 
having the same total weight. In the following years and up to now, a large body of works has relied on the 
use of such distances for image retrieval, see e.g. [GDROO, LCL04, LZLM05, ZWG06, HGS08, PW09]. 
This extensive use of transportation distances is largely due to their robustness when comparing his- 
tograms or discrete distances. For the same reason, these distances are also successfully used to compare 
local features between images, see [LO07, PW08, RDG08, RDG09, PW09]. Other uses of transporta- 
tion distances for images include: image registration [HZTA04], image morphing [ZYHT07] or junction 
detection [RT01]. 

The strongest limitation of transportation distances is their computational cost. Standard approaches 
quickly become intractable when dealing with a large amount of data in dimensions more than two. In- 
deed, the simplex algorithm, interior point methods or the Hungarian algorithm all have a complexity of 
at least 0(N 3 ) (N being the size of the data, either the number of samples or the number of histogram 
bins). Therefore, several works have proposed to speed up the computation or the approximation of op- 
timal transport, in particular in the field of image processing, where the amount of data is often massive, 
see [IT03, GD04, LO07, SJ08]. One particular case in which the computation is elementary and fast is the 
case of one-dimensional histograms, for which it is well known that optimal transport, in the case of a con- 
vex cost function, reduces to the pointwise difference between cumulative distribution functions [Vil03]. 
A question that arises is then the possibility to perform such simple and efficient computations in the case 
of circular histograms, i.e. histograms in which the first and last bins are neighbors. 

Indeed, circular histograms are especially important in image processing and computer vision. First, 
the local geometry is often efficiently coded by the distribution of gradient orientations. Such represen- 
tations offer the advantage of being robust to various perturbations, including noise and illumination 
changes. This is in particular the case for the well known SIFT [Low04] descriptor and its numerous vari- 
ants. In such a situation, the comparison of local features reduces to the comparison of one-dimensional 
circular histograms. Other local features involving circular histograms include the so-called Shape Con- 
text [BMP02]. Second, the color content of an image is often efficiently accounted for by its hue, in 
color spaces such as HSV or LCH. In such cases again, information is coded in the form of circular 
one-dimensional histograms. Several works in the field of computer vision have explicitly addressed the 
use of transportation distances in the case of circular histograms, either using thresholded concave cost 
functions [PW08, PW09] or L 1 cost functions [RDG08, RDG09]. 

The goal of this paper is first to give a general formulation of transportation distances when the cost 
function is a convex function of the Euclidean distance on the circle. This formulation gives a practical 
way to compute distances in linear time in this case. Second, we provide various experiments of image 
manipulation or retrieval for which the interest of circular transportation distances is shown. Eventually, 
we conclude with a discussion (that actually applies to both circular and non-circular cases) on the re- 
spective interest of transportation distances with either convex or concave cost functions when compared 
to classical bin-to-bin distances. It is shown that the choice between these three family of distances should 
essentially be driven by the type of perturbation the histograms are likely to suffer from. 

Outline The paper is organized as follows. In Section 2, the optimal transportation flow of the Monge- 
Kantorovich problem is investigated in the circular case. The definition of this problem being recalled, 
a new formula is introduced and a sketch of the proof is proposed (details of the proof are provided 
in appendix A). In section 3, several applications are studied to show the interest of such a metric for 
image processing and computer vision. First, an application to hue transfer between images is proposed 
in § 3.1 as a result of an optimal transportation flow on the circle. Then, in § 3.2, some applications of 
histogram comparison for image retrieval are proposed. Eventually, the discussion about the robustness 
and the limitations of Monge-Kantorovich distances in the framework of histogram comparison is given 
in Section 4. 
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2 The Monge-Kantorovich transportation problem on the circle 

In this section, we present some results on the Monge-Kantorovich distances between two circular his- 
tograms. In particular, we give an analytic formulation of these distances when the ground cost between 
points on the circle can be written as an increasing and convex function of the Euclidean distance along 
the circle. 



2.1 Definitions 

Consider two discrete and positive distributions 

N M 

f = ^2 f^ Sxi and 9 = J2 9\J] 5 Vj I ( 2 ) 

i=l i=l 

where {xi, . . . xn} and {yi, . . . i/m} are two sets of points on a subset Q of R . Assume that these 
distributions are normalized in the sense that X^iLi /[*] = 12jL d\J] = 1- Let c : Q x fi n- R + be a 
nonnegative cost function (called ground cost), the quantity 

N M 

MK C (/,£/):= min V V (Xi,jc(xi, yj), with (3) 

M = {(mj) G R N x R M ; aij > 0, Y^aij = g\j], = /[<]}, (4) 

« j 

is called the optimal transportation cost between / and g for the ground cost c. Matrices (ck»,j) in .M 
are called transport plans between / and g. If (a*,^) is optimal for (3), we say that (aij) is an optimal 
transport plan. 

Let d be a distance on Q and assume that the ground cost can be written c(x, y) = d(x, y) x , with the 
convention d(x, y) — l x ^y It can be shown [Vil()3] that 

• when A G [0, 1[, MK C is a distance between probability distributions ; 

• when A G [1, oo[, MKa (/, g) := (MK C (/, g)) J is also a distance between probability distributions. 

These distances are called Monge-Kantorovich distances, or Wasserstein distances. For A = 1, MKi is 
also known as the Kantorovich-Rubinstein distance, or in computer vision as the Earth Mover's Distance 
(EMD), as introduced by Rubner in [RTG00]. 

Computing optimal transportation costs is generally not an easy task. The main exception is the case 
of the real line: if fi = R, and if the cost c is a convex and increasing function of the euclidean distance 
\x — y\, then the optimal transport plan between / and g is the monotone rearrangement of / onto g, 
which sends the mass starting from the left. This result is usually false if c is not a convex function of the 
euclidean distance on the line. 

In the following, we take interest in the case where 1? is a circle S 1 of perimeter 1, and where c is an 
increasing function of the geodesic distance d along the circle. In particular, we will see that the previous 
result on the line can be generalized in this case. 

2.2 Optimal transportation for convex functions of the distance 

The main result of this section is an analytic formulation of the optimal transportation cost between the 
discrete distributions / and g on the circle S 1 when the ground cost c can be written c(x, y) — h(d(x, y)), 
with h : R —¥ R + an increasing and convex function and d the geodesic distance along the circle. In 
the following, we use the same notations for points on the circle and their coordinates along the circle, 
regarded as variables taking their values on the reduced interval [0, 1[ (modulo 1). It follows that d can 
be written 

d(x,y)=vom{\x-y\,l-\x-y\). (5) 
The distributions / and g on S 1 can be seen equivalently as periodic distributions of period 1 on R. 
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Let us define the cumulative distribution function of / on [0, 1[ as 



Vye[0,l[, F(y)=Y,m-Hx i e[o,y[} • 



(6) 



i=l 



F is increasing and left continuous, and can be extended on the whole real line with the convention 
F(y + 1) = F(y) + 1. This boils down to consider / as a periodic distribution on R. We define also the 
pseudo-inverse of F as F~ 1 (y) — inf{£; F(t) > y}. The interest of these definitions lies in the next 
result. 

2.2.1 An analytic formulation of optimal transportation on the circle 

Theorem 1 Assume that d is given by Equation (5) and that the ground cost c can be written c(x, y) — 
h(d(x, y)), with h : R — > R + an increasing and convex function. Let f and g be two discrete probability 
distributions on the circle, with cumulative distribution functions F and G, and let G a denote the function 
G — a. Then 



Idea of the proof This result is a generalization of the real line case, where it is well known [Vil03] that 
the global transportation cost between two probability distributions / and g can be written 



A proof of Equation (7) in a continuous setting has been proposed very recently in [DSS10], where it is 
shown that this equation holds for any couple of probability distributions. However, this proof involves 
some complex notions of measure theory which are not needed in the discrete setting. For the sake of 
completeness and simplicity, we provide in Appendix A a simpler proof of these theorem in the case 
of discrete distributions. The proof first focus on the case where / and g can be written as sums / — 
"p 2~2k=i an d 9 = ~F 2~^X=i &Vk ) where {x±, . . . xp} and {y±, . . . yp} are discrete sets of points on 
the unit circle S 1 . When the points are all pairwise different, we show that the circle can always be "cut" at 
some point, such that computing the optimal transport between / and g boils down to compute an optimal 
transport between two distributions on the real line (see Figure 1). This result is proven first for strictly 
convex functions h and for any optimal transport plan, then for any convex function h and a well chosen 
optimal plan. Once the problem has been reduced to the real line, Formula (7) follows from the fact that 
the optimal transport on R is given by the ordering of the points. The generalization of this formula to 
any kind of discrete distribution results from the continuity of the global transport cost MK C (/, g) in the 
values of the masses and their positions on the circle. 




(7) 




(8) 




Fig. 1 When the distributions are sums of unitary different masses, and when the ground cost is a nonnegative, increasing and 
convex function of the distance along the circle, there is a (non-necessarily unique) "cut" on the circle such that the optimal 
transportation on S 1 boil down to optimal transportation on the real line. 
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In practice, Formula (7) can be computed for any h at a precision e with a complexity in 0((N + 
M) log i) [DSS 10], where N and M are the number of points in the distributions / and g {i.e. the number 
of masses in the distributions). 

2.2.2 The case c(x, y) — d(x, y) 

If h is a power function ih>i*, with A > 1, Theorem 1 gives us a way to compute Monge-Kantorovich 
distances between / and g: 

MK A (f,g) = ( inf / l^" 1 - (G Q ) _1 | A ^ * . (9) 
1 (i.e. when the ground cost c is the distance d along the circle), this result can be 

MKi(/, 9 )= inf f \F-G-a\. (10) 

Observe that an alternative proof of Equation (10) was proposed by Werman et al. in [WPMK86] for 
distributions written as sums of unitary masses. A similar result is shown in [CM95] for the Kantorovich- 
Rubinstein problem, which is known to be equivalent (see [Vil03], chapter 1) to the Monge-Kantorovich 
problem when the cost c{x, y) is a distance, which is true for A = 1 (but false for A > 1). 

In practice, since F — G is piecewise constant for discrete distributions, the infimum of Equation (10) 
can be computed easily by computing the weighted median of the (finite number of) values F(t) — G(t) 
when t G [0, 1[, the weights being the lengths of the intervals on which F — G is constant. In practice, this 
yields a O(N) exact algorithm to compare two normalized distributions of N masses [CM98, Gur90]. 



In the case A = 
rewritten 



2.2.3 Discrete histograms 

Most applications deal with discrete histograms, i.e. discrete distributions living on a uniform grid of N 
bins. In the case of histograms on the real line, for the cost c(i,j) = \i — j\, Formula (8) becomes 

EMD(/, ff ) = MK! (/,<?)= \\F-G\\i, (11) 

where we denote by |j.|ji the discrete L 1 norm, by F and G the cumulative histograms of / and g, and 
where EMD is the acronym for Earth Mover's Distance [RTG00]. An illustration is given in Figure 2. 

In the case of circular histograms, bins and N — 1 are neighbors. If the cost c is c(i,j) — min(|i — 
j\, N — \i — j\) along the circle, Formula (10) can be rewritten 

N-l 

CEMD (f,g) := MKi (/,<?) = inf V \F\i] - G\i] - a\ = \ \F - G - /x| |i , (12) 

a z — J 

i=0 

where /x is the median of the set of values {F[i] — G[i] ,0 < i < N — 1}, and where CEMD is the acronym 
for Circular Earth Mover's Distance, as defined in [RDG09]. Indeed, it is easily checked that the distance 
defined by Formula (12) is equivalent to the distance introduced in [RDG09], that is 

CEMD (/, g) = min \\F k - G k \\ U (13) 
fce{o,...,JV-i} 

where F k [i] is denned as F[i] - F[k] if i G {k, . . . N - 1} and F[i] - F[k] + 1 if i G {0, . . . k - 1} 
(the definition being similar for Gk by replacing / by g). In other words, the distance MKi (/, g) is the 
minimum in k of the L 1 distance between F k and Gfc, the cumulative histograms of / and g starting at 
the k th quantization bin. 
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Fig. 2 Using c(x, y) = \x — y\ on the real line, the optimal transportation plan between two discrete histograms / and g is 
the L 1 distance of the difference between the cumulative histograms F and G (Formula (11)). 

2.3 Optimal transportation for concave functions of the distance 

In practice, it may be useful to choose the ground cost c as a nonnegative, concave and increasing function 
h of the ground distance d. For instance, for the task of image retrieval, several authors [RTGOO, HGS08, 
RT01]) claim that optimal results can be achieved with a function 

h(t) = l-e~-. (14) 

Notice that if h is increasing, concave and such that h(0) = 0, it is easy to show that h(d) is also a distance, 
and thus MK C is also a distance between probability distributions. Another property of concave costs is 
that they do not move the mass which is shared by the distributions [Vil03]: if / and g are histograms, 
the problem is reduced to the comparison of (/ — g).lf- g > with (g — f).lf- g<0 , which have disjoint 
supports. 

However, in the case of such concave functions h, Theorem 1 does not apply, and there is no general 
and fast algorithm to compute corresponding optimal transportation costs, either on the real line or on 
the circle. In most cases, we are reduced to use linear programming, i.e. simplex or interior point algo- 
rithms, which are known to have at best a 0(N 3 log N) complexity to compare two histograms on N 
bins [BDM09]. We describe in the following some special cases of concave function h for which this 
complexity can be reduced. 

2.3.1 L 1 as a Monge-Kantorovich distance 

If the distributions / and g are discrete histograms on N bins, and if h(t) = lt^o> then the Monge- 
Kantorovich distance between / and g is [Vil03] 

1 N 

MK lj(l , s) , (/, 9 ) = - J2 l/M " am = 11/ - slli. (15) 

i=l 

In other words, the L 1 distance between two normalized histograms / and g is a Monge-Kantorovich 
distance for the concave function lt^o- 

2.3.2 Thresholded distances 

In [PW08, PW09], Pele and Werman consider thresholded ground distances, using h(t) — min(i, T), 
with T a given threshold. Up to a multiplicative factor, this function h can almost be seen as a discrete 
version of (14), where r is chosen proportional to T. They show that in this case, the computation of the 
optimal cost can be solved by a "min-cost-fiow algorithm", whose complexity is smaller than classical 
linear programming algorithms. In their experiments, they use T — 2 for comparing histograms on N 
bins, which means that the cost c(i,j) can take only three values: if i = j, 1 if i and j are neighbors, 
and 2 in other cases. Since all the shared mass remains in place, we know that if (ctij) is the optimal 
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transport plan between / and g, then for a given i, Y^j^a ot-i,j = (/[*] — s[*])l/[i]>g[i]> which implies in 
particular that 

N 

MK min(dj2 )(/,g) = ^((^2ai,j) ~ «M+i - oa,i-i) 

N N 

= 2 ^(/W - g[i])l m > g[ i\ +ati,i-i) 

i=l i = l 

N 

= 11/ - Sill - ^2{ai,i+i + Ot,i-i). 
i=l 

Now, notice that a^i+i is different from only if f[i] > g[i] (otherwise, all the mass in i stay in place) 
and f[i + 1] < g[i + 1] (otherwise the mass g[i + 1] is already "filled" by a part of f[i + 1]). In other 
words, the only points where the quantities aj,j_i or a^i+i are different from are the points where the 
densities of / and g are crossing. It follows that in many cases, the thresholded distance MK min ( d 2 ) is 
very close to L , in particular when N is large and when histograms are crossing at only a few places, 
as we will see in the experiments of Section 4. In order to allow larger ground displacements, the use of 
values of T larger than 2 is proposed in [PW09]. This is made at the price of a non-linear complexity and 
necessitates a compromise in the tuning of T (smaller values yield faster computations). We will come 
back on the use of such concave cost functions in Section 4. 

In the following section, we illustrate the interest of circular Monge-Kantorovich distances for several 
applications. 



3 Experimental study 

This section is devoted to an experimental analysis of the previous optimal transportation framework for 
different computer vision tasks. We first consider the color transfer from one image to another as an 
application of the optimal transportation flow on the circle between hue histograms. In the following, 
two different applications of transportation distances for the comparison of histograms are studied: local 
feature comparison and image retrieval. 



3.1 Hue transfer between color images 

The aim of this paragraph is to use the optimal transportation framework to transfer a hue distribution 
from one image to another. 

Contrast transfer First, let us recall that histogram equalization and more generally histogram specifica- 
tions are merely particular cases of optimal transportation on the real line. Indeed, if u : Q i— > {0, . . . , L — 
1} is a discrete image and h u its gray level distribution, histogram specification consists in finding the 
optimal transport plan between h u and a target discrete probability distribution ht on {0, . . . , L— 1} (one 
speaks of histogram equalization when ht is a constant distribution on {0, . . . , L — 1}). If one considers 
a cost c equal to the euclidean distance on the line, then, as explained in Section 2.1, the solution of this 
problem consists in a monotone rearrangement. This rearrangement is obtained by applying the function 
if t -1 o H u to u, where H u (resp. H t ) is the cumulative distribution function of h u (resp. h t ) and H^ 1 
represent the pseudo-inverse of Ht (see definition in section 2.2, after Formula (6)). If u is a color image, 
such contrast adjustments can be applied to its "intensity" channel {e.g. the channel "Value" in the HSV 
representation). 

Hue transfer Thanks to Formula (12) or (13) (Section 2.2.2), one can extend the previous framework to 
hue distributions, seen as circular distributions. Following Equation (12), the optimal mapping between 
the hue distribution h u of an image u and the target hue distribution ht is obtained as {Ht — a) -1 o H u , 
where a is the median of the values {H u [i\ — H t [i]}. Figure 3 illustrate this transfer of hue on a pair of 
images. For a detailed survey on color transfer between images, we refer the reader to [PKD07]. 
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Fig. 3 Hue transfer between images. First row: original images. Second row: the hue channel of each image has been 
modified by applying the circular optimal transportation flow between the hue channels (see text for details), while other 
channels ("saturation" and "value" in HSV representation) are kept unchanged. 

3.2 Image and histogram comparison 

In this section, we investigate the interest of Monge-Kantorovich distances for histogram comparison. 
More precisely, the distance considered in the following experiments is the MKi distance (given by Equa- 
tion (12)) on the circle. Following [RDG09], we refer to it as CEMD (Circular Earth Mover's Distance). 
This distance is compared in particular to the L 1 distance, which can be described as being bin-to-bin, 
since it compares bins having the same index. 

3.2.1 Local feature comparison 

Many computer vision tasks rely on local features (object recognition, image retrieval, indexation and 
classification, image mosaicing, etc). Some of the most commonly used (and invariant) local features 
are the Shape Context [BMP02] and SIFT descriptors [Low04], which both encode periodic data: polar 
orientation for the former, and gradient orientation for the latter. For instance, a SIFT descriptor can be 
seen as a collection of M one-dimensional and circular-histograms, each one being constructed from a 
subpart of a localization grid in the image domain [Low04]. 

In [RDG09], it is demonstrated that the Circular Earth Mover's Distance (or MKi distance on the cir- 
cle) is far more robust than classical bin-to-bin distances (L 1 and L 2 metric, % 2 distance, etc) to compare 
SIFT descriptors. In particular, it is underlined that this cross-bin distance is more adapted to two kinds 
of perturbations 

• quantization effects (also know as the "binning problem"), which result from the small number of 
samples used to build the discrete histograms of gradient orientations, but also from the localization 
grid used to extract histograms over the pixel grid; 

• shifts in histograms, resulting from geometrical deformations in the image (e.g. perspective effects, 
which typically arise when an object is seen from different points of view). 
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Note that this result is consistent with the one presented in [PW08], where the metric used to compare 
SIFT descriptors is a modification of the Monge-Kantorovich distance with a truncated cost c. A more in 
depth analysis of the general robustness of Monge-Kantorovich distances for both quantization and shift 
effects is proposed in section 4. 

3.2.2 Three experiments on color image retrieval 

For the task of color image retrieval, numerous studies have shown that the Earth Mover's Distance 
(defined in Section 2.1) often achieves better retrieval performances than bin-to-bin distances [RTGOO, 
GDROO, RT01, Dvi02, LCL04, LZLM05, ZWG06, HGS08, PW09]. In order to illustrate the advantages 
of the Circular Earth Mover's Distance in the same context, we rely on hue distributions to perform image 
retrieval on a color image dataset. The dataset 1 contains 14 category of 9 pictures of the same object, with 
various camera settings (sensitivity, with or without flash, white balance reference, exposure time, etc). 
Nine pictures of the same category are shown as an example in Figure 4. Each of the P = 14 x 9 = 126 
images of the dataset is described by a hue (channel H of HSV representation) distribution, built on 
N = 360 bins. For a given dissimilarity measure D, the retrieval experiment consists in using an image 
of the dataset as a query and finding the r most similar images for D. For each value of r, we compute 

• the recall, which is defined as the average, when the query spans the database, of the ratio between 
the number of correctly retrieved images among r and the size of the query class; 

• the precision, which represents the average on the whole database of the rate of true positives among 
the r most-similar images. 

The curves (r, recall) and (recall, precision) are drawn on Figure 5, for three different dissimilarity mea- 
sures: CEMD (Equation (12)), non circular EMD (Equation (1 1)), and the L 1 distance. 




Fig. 4 Example of a category of 9 pictures extracted from the image database used for image retrieval (results ate shown in 
Figure 5). These photographs represent the same scene under various illumination conditions and camera settings. 




(a) Average Recall rate according to the rank (b) Average precision-recall curve 



Fig. 5 retrieval of a color image database. The performances curves are obtained using CEMD in red continuous line, L 
distance in blue line and also EMD (non circular mass transportation) in red dashed line. 



the image dataset is available at the following address: http://perso.telecom-paristech/~rabin/database/ 
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In this experiment, the results of CEMD (in red continuous line) clearly outperforms those of L 1 and 
EMD (which does not take into account the circularity of hue histograms). As in the experiments on 
local feature comparison, one can guess that the superiority of Monge-Kantorovich distances are due to 
their natural robustness to shifts in the distributions. However, observe that this time, in contrast with 
the application to local descriptors, a huge number of samples (~ 5.10°) is used to build each circular 
histogram, so that we avoid the aforementioned binning problem. 



Two more color retrieval experiments In this paragraph, we aim at showing that two different classes of 
intraclass variability could arise when representing data by histograms, and that bin-to-bin and cross-bin 
distances behaves very differently according to these perturbations. This fact will then be discussed in 
more detail in Section 4. 

In order to illustrate these phenomena, a small dataset 2 of 22 photographs has been used, shown 
in Figure 6. Here, we propose to reproduce -in a synthetic way- the color image retrieval experiment 
presented in section 3.2.2. For each picture of this dataset, synthetic modifications are proceeded in order 
to simulate two types of perturbations which naturally arise when considering color image retrieval: 

• Gamma correction with a power factor varying from 0.6 to 2.4 (this operation has been realized on 
the "Luminance" channel in CIELab color space). An example is shown in Figure 7(a); 

• White balance correction with a "color temperature" varying from 4400 to 6200° K (Example is 
given in Figure 8(a)). 



f|iillI^iEi£li|:|ii;t§liifc 

Fig. 6 22 pictures used for image retrieval test (results are shown in Figures 7 and 8). 



Now, applying these modifications to the dataset, we obtain two different databases on which a re- 
trieval is performed using L 1 and CEMD metric (like in section 3.2.2). Results are shown in Figure 7 for 
gamma correction, and in Figure 8 for color temperature correction. 




the image dataset is available at the following address: http://perso.telecom-paristech/~rabin/database/ 



Transportation Distances on the Circle and Applications 



11 



Once again, in the case of gamma correction (Figure 7) CEMD provides in average better retrieval 
results than L 1 distance. The main reason in such case is that we observe some intraclass shifts between 
histograms, for which cross-bin distances such as CEMD are more robust than bin-to-bin distances. 




Now, in the case of color temperature modification (Figure 8), one observes the following result: L 1 
distance provides better retrieval scores than CEMD. An examination of the results has led us to observe 
that, in such a case, the intraclass variability results this time from differences of weight of dominant 
modes in histograms (see Figure 9(b) for an illustration). 




(a) Illustration of histogram shift (b) Illustration of histogram weight variability 



Fig. 9 Illustration of the two main classes of perturbations involved in retrieval performances: intraclass shift variability (to 
the left) and intraclass weight variability (to the right). 



In order to understand the implications of these results, a discussion is proposed in the following 
section. 



4 Is it worth using transportation distances to compare histograms ? 

Following the last two experiments of the previous section (gamma correction and color balance), this 
section provides a discussion on the relative advantages of Monge-Kantorovich distances using convex 
cost functions, those using concave cost functions, and bin-to-bin distances. The discussion is not specific 
to the circular case and will be made from non-circular synthetic examples. 

Writing as before d(x, y) for the geodesic distance on the circle, we consider the following distances: 
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• the L bin-to-bin distance, 

• Monge-Kantorovich distances with concave cost functions: 

o MKexpr defined from Formula (3) when using the exponential cost function c(x,y) — 1 — 
exp(-^l) 

o MKtt- defined from Formula (3) when using a thresholded cost function as introduced in [PW08, 
PW09], that is, c(x, y) = min.(d(x, y), r) (see Section 2.3.2). 

• Monge-Kantorovich distances with convex cost functions: MKa = (MK C )~, with MK C the quantity 
defined by Formula (3) when using a cost fonction c(x, y) = d(x, y) x , for A > 1. 

Recall that among these distances, only L , MKa and MK-r 2 can be computed in linear time. Observe 
also, following the remarks of Section 2.3.2 on the proximity between MKx2^nd L , that these distances 
in a sense produce a complete range of alternatives between bin-to-bin distances (such as L 1 ) and Monge- 
Kantorovich associated with highly non-convex cost functions (e.g. MK3). This fact will be quite clear in 
the following synthetic experiments. 

These experiments consist, in order to study the assets of the various distances, to perform retrieval 
from synthetic histograms (mixture of two Gaussians) in the presence of two types of perturbations: 
shifts in the positions of bins on the one hand, and variation in the weight of bins on the other hand (see 
Figure 9). Observe that these two types of perturbations correspond to the ones encountered at the end of 
Section 3.2.2. 

We assume that elements to be compared belong to two classes A and B, and that each element is 
represented by one ^-bins histogram. We model the histograms as the mixture of two Gaussians. Writing 
c G {A, B} for the class, these two Gaussians have weights p c and (1 — p c ), means /if and /i§, and 
standard deviations erf and er£ (see Figure 10). In the following experiments, parameters are set as follows 

• Histogram construction Quantization of histograms: N = 100 bins; Number of samples for Gaussian 
mixture data generation: 1, 000 samples in [0, 1]; Number of histograms per class: 1, 000 histograms. 

• Gaussian mixture parameters Weights: p A = 0.6 and p B — 0.8; Means: ^2 = fJ,§ = 0.2 and 
^2 — ^2 — 0.7; Standard-deviations: 0-5 4 = erf = 0% = erf = 0.05. 




This generative model being chosen, two different kinds of variability can now be simulated to eval- 
uate the robustness of transportation distances depending on the cost function 3 (see Figure 9). 

Histogram shift We introduce random shifts in the histogram by modeling the means /if as random vari- 
ables. We choose /if 1 = 0.2 + e M , where e M is uniformly drawn in [—0.1; 0.1]. Some of such generated 
histograms are superposed in Figure 11(a). The precision-recall curves resulting from this two-class re- 
trieval problem are plotted for different metrics in Figure 11(b). One first observes that distances MKa, 
relying on convex cost functions, give the best results, the larger A the better. Second, it can be seen that 
transportation distances with concave cost function yields less efficiency. First are distances relying on 
an exponential cost. Eventually using transportation distances with truncated L 1 distances provides poor 
results, similar to those obtained with the L 1 distance. This fact is in agreement with the analysis made in 
Section 2.3.2. 

3 The Earth Mover's Distances with exponential and truncated cost functions have been computed using the code kindly 
provided by Y. Rubner [Rub]. 



Transportation Distances on the Circle and Applications 



13 




(a) Histogram shift variability experiment (b) Precision-Recall curve 

Fig. 11 Two-class retrieval problem with intraclass shift variability. The effect of the perturbation on histograms is 
shown in Figure 1 1(a). The Precision-Recall curves are displayed in Figure 1 1(b) for several transportation distances: MKy T 
refers to as the transportation distance with truncated cost function according to the threshold t G {2, 10}, MK cxpT cor- 
responds to the transportation distance with exponential cost function using parameter r € {1,2,5}, and MKj is the 
Monge-Kantorovich distance with A £ {1, 2, 3}. In addition is shown the curve obtained with L 1 metric, which is equivalent 
to MK T i (see §2.3.1). 

Histogram weight variability In the second experience, intraclass weight variability are now simulated 
by modeling weights as random variables: pi — 0.6 + e p , where e p is uniformly drawn from [—0.1; 0.1]. 
Some of such generated histograms are superposed in Figure 12(a). The precision-recall curves resulting 
from this two-class retrieval problem are plotted for different metric in Figure 12(b). One observes that 
with this kind of perturbation, transportation distances with L 1 cost function are less robust than the L 1 
distance. This time, it can be seen that distances with concave cost function yield better retrieval perfor- 
mances. Using thresholded cost functions again provides results that are very similar to those obtained 
with the L 1 distance. In the meantime, distances relying on exponential cost functions are still half-way 
between convex cost functions and thresholded cost functions. 

It therefore appears that higher robustness to one type of perturbation yields poorer robustness to the 
other type. There is a logical tradeoff between robustness to shifts and weight variability. In this context, 
and given that it may be computed in linear time, the MKi distance appears as a good compromise in 
term of computational cost and robustness to the two kinds of variability considered here. 

5 Conclusion 

In this paper, the optimal mass transport problem on the circle has been addressed in the case of convex and 
increasing cost function of the geodesic distance on the circle. We have proposed a new formulation (and 
a proof) for estimating the corresponding Monge-Kantorovich distances. In the particular case where the 
cost function is the geodesic distance on the circle, it has been shown that the transportation distance MKi 
between circular histograms (also referred to as CEMD, standing for Circular Earth Mover's Distance) can 
be deduced by a very simple Formula (12) which is computed in linear time. 

Then, several applications in this framework has been studied (hue transfer, local features compar- 
ison and color image retrieval), exploiting both the optimal transportation cost between histograms but 
also the corresponding optimal flow. Other applications could also benefit from the CEMD metric, such 
as shape recognition based on circular descriptors (see e.g. character recognition with orientation his- 
togram [CS02], and curvature based descriptor along closed contour [Mok97, JWR06]). 

In the last section, a comparative analysis of transportation distances with different cost functions has 
been proposed, considering two types of perturbations which arise with histogram representation: mean 
and weight changes of dominant modes. We have demonstrated that there is a tradeoff between these two 
phenomena when using either convex or concave cost functions. Eventually, the proposed CEMD metric 
offers an interesting compromise between these two choices, while being easy to use. 
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(a) Histogram weight variability experiment (b) Precision-Recall curve 

Fig. 12 Two-class retrieval problem with intraclass weight variability. The effect of the perturbation on histograms is 
shown in Figure 12(a). The Precision-Recall curves are displayed in Figure 11(b), plotted for different transportation dis- 
tances. 
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A Appendix: Proof of Theorem 1 

This appendix provides a complete proof of Theorem 1 in the case where / and g are discrete distributions 
(as written in Equation (2)). We first prove this theorem for distributions composed of unitary masses, and 
conclude thanks to continuity arguments. 



A.l Introduction 

Consider two discrete sets of points {x±, . . . xp} and {y±, . . . yp} on the unit circle S , and the corre- 
sponding discrete distributions 

1 P 1 P 

f = p Yl 5x *' and # = p Yl (16) 

fc=i fe=i 

where the notations x k , y k are used equally for points on the unit circle or for their coordinates in [0, 1[. 
Let d be the geodesic distance along the circle (given by Equation (5)) and assume that c can be writ- 
ten c(x,y) — h(d(x,y)) with h a nonnegative, increasing and convex function. It is well known (this 
is a consequence of Birkhoff's theorem, see for example the introduction of [Vil03]) that the optimal 
transportation cost between / and g equals 

MK c (/,ff)= mm W£(f,g), with W% (f,g) := V c(x k ,y a (k)) = V h(d(x k ,y„(k))), 

(17) 

where Up is the set of permutations of {1, . . . P}. In other words, finding the optimal transportation 
between / and g boils down to find the optimal permutation a between the points {x^} and {yj}. 

A. 1.1 Paths 

If x and y are two different points of S , we note j(x,y) the geodesic path linking x and y on S 1 (the 
path is supposed open: it does not contain x and y). This path is always unique except in the case where 
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x and y are in opposite positions on the circle. In this case, we choose j(x,y) as the path going from 
x to y in the trigonometric direction. A path j(x, y) is said to be positive if it goes from x to y in the 
trigonometric direction. If the path goes from x to y in the opposite direction , it is said to be negative. 

A. 1.2 Cumulative distribution functions 

The cumulative distribution function of / has been defined in Equation (6). Now, on [0, 1[ seen as a the 
unit circle S , no strict order can be defined between points, which means that we can define as many 
cumulative distribution functions as there are starting points on the circle. If a; is a point in [0, 1[, the 
a; -cumulative distribution function F x of / can be defined by choosing x as the reference point on the 
circle S 1 and by summing the mass in the trigonometric order from this new reference point: 



An example of a cumulative distribution F and its corresponding accumulative distribution F x on [0, 1[ 
is shown on Figure 13. 



Fig. 13 F on the left and F x on the right. 



A. 2 Preliminary results 

In the following, we prove that if / and g can be written as in Equation (16), if the points x\, . . .Xp and 
3/i, . . . yp are pairwise different, and if a is an optimal permutation for (17), there is always a point on 
the circle which is not contained in any optimal path of a. This result is proven first for strictly convex 
functions h and for any optimal permutation a, then for convex functions h and a well chosen optimal 
permutation. 

Proposition 1 Assume that h is strictly convex. Let xi, . . . xp and yi, . . . yp be P points in [0, 1[, all 
pairwise different. Then for each permutation a of Tip which minimizes (17) , there exists k £ {1, . . . P} 
such that for all I ^ k, x^ 7(£(, 2/o-(i))- 

The proof of this proposition needs the following lemma, which describes some properties of the 
geodesic paths i(xi, 2/o-(z)) obtained when a is aminimizer of (17) and h is strictly convex. 

Lemma 1 Assume that h is strictly convex. Let a be a minimizer of (17) and let 7; = 7(2?!, Vcr(i)) an d 
7^ = 7 (xk , y a (fe) ) ( with I ^ k) be two geodesic paths for the assignment defined by a. Assume also that 
xi 7^ Xk and y a m 7^ 2/o-(fc)- Then, one of the following holds: 

• li n 7fc = ; 

• ll n 7fc 7^ and in this case 7; and 7^ have the same direction (both positive or both negative) and 
neither of them is contained in the other. 

Proof Assume that 7; n 7fc 7^ 0. If ji f) 7fc is equal to j(xi, Xk), then, since h is an increasing function 
of d, c(xi,y a (i)} > c(xk,y a (i)} and c(xk,y a (h)) > c { x UV<j{k)), which contradicts the optimality of a. 
The same conclusion holds if 7; n 7fc is equal to ^{y^m, y a (k))- Moreover, if for example the path 7; is 
included in 7^, then the strict convexity of the function h implies 



VyGR, F*(y) = F(x + y)-F(x), 



(18) 




X 



c{x u y a{l) ) + c(x k ,y<r(k)) > c{xuVa{k)) + cfcfe, J/<r(I))> 
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which also contradicts the optimality of a. Thus, 7/ PI jk is equal to "f(xi, y a (h)) or to j(xk, y a (i)) an d it 
follows that 7^ and 7/ are either both positive or both negative. 

Proof of Proposition 1 Let a be a minimizer of (17). In the following, we will denote by 7; the 
geodesic path j(xi, y a (i))- We can assume without loss of generality that the points x\, . . .xp are in 
trigonometric order on the circle. 

Assume that for each I £ {1, . . . P}, there exists q(l) 7^ I such that x\ belongs to the open path 7 9 (z). 
Then, for each I, we have J q (i) H 7/ 7^ 0, which means that the geodesic paths J q (i) and 7/ are either 
both positive or both negative (from lemma 1). Assume for instance that they are both positive and let us 
show that in this case xi G 7/-1 (with I — 1 = P if I = 0). If q(l) =1 — 1, there is nothing to prove. If 
q(l) 7^ Z — 1, it means in particular that x q ^, X1—1, xi are in trigonometric order on the circle. Since7 9 (;) 
is a positive path starting from x q ^ and containing xi, it follows that Jqm contains (recall that the 
points are assumed to be pairwise different, in particular 7^ awj)). Thus 7;_i n 7 g (/) 7^ 0, which 
implies that 7/_i is positive. Now, xi must be in 7;_i, otherwise we would have 7;_i C 7 9 (;), which 
contradicts lemma 1. Thus, if the paths 7 g (z) and 7; are both positive, xi G 7;_i. 

In the same way, if 7 9 (;) and 7; are both negative, then xi G 7;+i- In any case, for each / G {1, . . . P}, 
xi € 7/_i U 7;+i (with the obvious convention 7p+i = 71, 70 = yp). 

Now, suppose that for a given k 6 {1, . . . P}, Xk is in 7fc-i- Then, "fk-i and jk have the same 
direction. From lemma 1, it follows that x^-i cannot be contained in 7^. Since we know that x^-i G 
7fc-2 U 7fc, Xk-i must be in 7^-2- Recursively, for each I G {1, . . . P}, x\ G tj— 1. It follows that for 
each I G {1, . . . P}, d(xi, 2/ CT (/_i)) < d(xi-i, y a (i-i)), and since h is increasing 

p p 

1=1 1=1 

which contradicts the fact that a is a minimizer of (17). We come to the same conclusion if for a given 
k 6 {1, . . . P}, x k is in 7 fe+ i □ 

The same result can be proven for any convex function h with the difference that it is only satisfied 
for a good choice of the permutation a which minimizes (17), and not for all of these permutations. This 
result can be seen as a limit version of proposition 1 . 

Corollary 1 Assume that h is convex. Let x\ , . . . xp and yi, ■ ■ ■ yp be P points in [0, 1[. Assume that all 
these points are pairwise different. Then there exists a permutation a of Sp which minimizes (17) and a 
point Xk G {x 1, ... xp} such that for all I 7^ k, Xk ^ j{xi,y a (i))- 

Proof We know that for any strictly convex function h, if o~h minimizes the cost a 1— » W„ (/, g), there 
exists k G {l, . . .P} such that for all I ^ k, Xk ^ 7; = ~f(xi, ycr h (i))- 

Now, assume that h is convex (not strictly). One can always find a sequence (h n ) of increasing 
and strictly convex functions such that h n converges pointwise towards h when n — > 00. If a and the 
points xi, . . . xp,yx, . . . yp are fixed, then the finite sum W„ (/, g) := -p h n (d(xk, y CT (fc))) tends 
towards W a (f,g) = -p- Ylk h(d(xk, y CT (fc))) when n — > 00. Thus, for each e > 0, there exists an 
integer N, such that for all n > N, \ W™ (f,g) — Wa {f,g)\ < £• Since Up is a finite set, we can 
chose N large enough such that this property holds for every a in Up. We can also chose iV such that 
I mino- W a (/, g) — rahia Wa (/, <?)| < e. Now, if n > N and if a* is an optimal permutation for 
W% (f,g), it follows that 

I min Wa (/, g)-Wa- (f,g)\ < I min Wa (/, g) - min W^ (/, g) \ 

a a a 

+\Wa*(f,g)-Wa* (f,g)\ 

< 2s. 

Since Up is a finite set, the fact that this distance can be made arbitrarily small implies that when n is 
large enough, a minimizer a* of Wa(f, g) is also a minimizer of Wa{f, g). This proves that there exists 
at least one minimizer a of a 1— > Wa(f, g) such that Xk j(xi, y ax (i)) for some k G {1, . . . , P} and all 
l^k. 
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A. 3 Proof of Theorem 1 



Proof of Theorem 1 

Let us begin with the case where / and g can be written as sums of unitary masses (Equation (16)), 
and where xi, . . . xp and y\, . . . yp are pairwise different. Proposition 1 and Corollary 1 show that if the 
ground cost c can be written c(x, y) = h(d(x, y)) with h a positive, convex and increasing function, we 
can choose some optimal permutation a for which there is some point x^ which is not contained in any 
path of a (recall that paths are defined as open: they do not contain their boundaries). Since all points are 
supposed pairwise different, the only path meeting all the neighborhoods of Xk is jk- It follows that there 
exists some open set on one side of Xk and not containing Xk which does not cross any path of the optimal 
permutation a. The middle x of this open set is not contained in any path of a. We can thus cut the circle 
S 1 at x and reduce the transportation problem on the circle to the transportation problem on the real line. 
The optimal permutation a is thus given by the sorting of the points (formula (72) in [Vil03]), taking x as 
the reference point on the circle. This means that when points are pairwise different, we have 

MK C (/,<?)= inf / hQF- 1 -G- 1 ]), (20) 

where F^ 1 and G" 1 are the pseudo-inverses (pseudo-inverses are defined in Section 2.2) of the increasing 
functions F x and G x defined in Equation (6). 

Now, observe that F x and G x are horizontal translations of F — F(x) and G — G{x) by the same 
vector x. In consequence, 



/ fcflF- 1 - G" 1 1) = / h(\(F - F(x))- 1 - (G - G(x)Y 
Jo Jo 



(21) 



Since F and G have been defined on R such that for all y, F(y+1) = F(y) + 1 and G(y + 1) = G{y) + 1, 
the bounds of this integral can be replaced by any bounds (t, t + 1). It follows that 

1 h{\(F - Fix))- 1 -(G-GW)- 1 !)^ f 1 h(\(F - F(x))- 1 -(G-G^r 1 !) 

J-F{x) 

1 h^Fy 1 -{G + F{x)-G{x))- 1 \). 



Finally, 

MK c (/ l5 )= inf f ihiKF)- 1 -(G + F(x)-G(x))- 1 \). (22) 

In order to conclude, notice that the function tp : a, h-> J l/i(|(_F) _1 — (G + is continuous 

(h : R — > R + is continuous since it is convex) and coercive ( tp(a) — > +oo when \a\ — > +oo). It follows 
that tp reaches its minimum at a point ao G R. In addition, the fact that F and G are piecewise constant 
implies that tp is piecewise affine, with discontinuities of tp' at points F(x) — G(x). Thus, 

MK C (/,<?)= inf / \hi\iF)- 1 -{G + a)- 1 ]). (23) 



The previous result can be generalized to the case where the points xi, yj may coincide just by remark- 
ing that both quantities in Equation (23) are continuous in the positions of these points. In consequence, 
the result holds for distributions with rational masses. 

In order to generalize the result to any couple of discrete probability distributions, observe that the 
right term in Equation (23) is continuous in the values of the masses f[i] and g[j]. As for the continuity 
of MK C (/, g), assume that a mass e of the distribution / is transferred from the point Xi to the point Xi 1 
in /, and let us call the new distribution f e . If (a) is an optimal transport plan between / and g, let jo be 
an index such that oti a j > e. A transport plan (a') between f e and g can be defined as 

• a ij = for (*>i) ^ (*o> Jo), (h,jo). 
The corresponding transportation cost between f e and g is then lower than MK C (/, g)+eh(^), which im- 
plies that MK c (f, g) < MK C (/, g)+eh(\). Conversely, we can show that MK C (/, g) < MK c (f £ ,g) + 
eh$). ' □ 
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